This notebook will walk you through implementing a custom iterator for a modified version of the Street View House Number (SVHN) dataset. You will then design a network to train on this dataset.
This dataset is a collection of 73,257 images of house numbers collected from Google Streetview. The original dataset has bounding boxes for all the digits in the image:
We have modified the dataset such that each image is 64x64 pixels (with 3 color channels), and the target is a single bounding box over all the digits. Your goal is to build a network that, given an image, returns bounding box coordinates for the location of the digit sequence.
This notebook is split into two parts: first, writing a custom data iterator for the modified SVHN dataset; second, designing and training a network to predict bounding boxes.
Because the training set of ~27,000 images can fit into the memory of a single Titan X GPU, we could use the ArrayIterator class to provide data to the model. However, when a dataset has more images or larger image sizes, that is no longer an option. Our high-performance DataLoader, which loads images in batches and performs complex augmentations, cannot currently handle bounding box data (stay tuned, an object localization dataloader is coming in a future neon release!).
We've saved the dataset as a pickle file svhn_64.p. This file has a few variables:
- X_train: a numpy array of shape (num_examples, num_features), where num_examples = 26624 and num_features = 3*64*64 = 12288
- y_train: a numpy array of shape (num_examples, 4), with the target bounding box coordinates in (x_min, y_min, w, h) format
- X_test: a numpy array of shape (3328, 12288)
- y_test: a numpy array of shape (3328, 4)
Let's first import our backend:
In [ ]:
from neon.backends import gen_backend
be = gen_backend(batch_size=128, backend='gpu')
# set the debug level to 10 (the minimum)
# to see all the output
import logging
main_logger = logging.getLogger('neon')
main_logger.setLevel(10)
The modified SVHN dataset can be found at: https://s3-us-west-1.amazonaws.com/nervana-course/svhn_64.p. Download this file into the data/ folder; we can then load the pickle file containing our SVHN dataset.
In [ ]:
import cPickle

fileName = 'data/svhn_64.p'
print("Loading {}...".format(fileName))

# open in binary mode ('rb'), the safe way to read pickle files
with open(fileName, 'rb') as f:
    svhn = cPickle.load(f)
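As an optional sanity check, you can confirm that the loaded arrays have the shapes described above:
In [ ]:
# optional: verify the shapes described above
print svhn['X_train'].shape  # expect (26624, 12288)
print svhn['y_train'].shape  # expect (26624, 4)
print svhn['X_test'].shape   # expect (3328, 12288)
print svhn['y_test'].shape   # expect (3328, 4)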
Below is a skeleton of the SVHN data iterator for you to fill out, with notes to help along the way. The goal is an object that returns, with each call, a tuple of (X, Y)
for the input and the target bounding boxes.
In [ ]:
# import some useful packages
from neon.data import NervanaDataIterator
import numpy as np
import cPickle
import os

class SVHN(NervanaDataIterator):

    def __init__(self, X, Y, lshape):

        # Load the numpy data into some variables. We divide the image by 255 to normalize the values
        # between 0 and 1.
        self.X = X / 255.
        self.Y = Y
        self.shape = lshape  # shape of the input data (e.g. for images, (C, H, W))

        # 1. assign some required and useful attributes
        self.start = 0  # start at zero
        self.ndata = ...  # number of images in X (hint: use X.shape)
        self.nfeatures = ...  # number of features in X (hint: use X.shape)

        # number of minibatches per epoch
        # to calculate this, use the batch size, which is stored in self.be.bsz
        self.nbatches = ...

        # 2. allocate memory on the GPU for a minibatch's worth of data
        # (use `self.be` to access the backend; see the backend documentation)
        # to get the minibatch size, use self.be.bsz
        # hint: X should have shape (# features, mini-batch size)
        # hint: use some of the attributes previously defined above
        self.dev_X = ...
        self.dev_Y = ...

    def reset(self):
        self.start = 0

    def __iter__(self):
        # 3. loop through minibatches in the dataset
        for index in range(self.start, self.ndata, self.be.bsz):
            # 3a. grab the right slice from the numpy arrays
            inputs = ...
            targets = ...

            # The X and Y arrays are in shape (batch_size, num_features),
            # but the iterator needs to return data with shape (num_features, batch_size).
            # Here we transpose the data and then store it as a contiguous array;
            # numpy arrays need to be contiguous before being loaded onto the GPU.
            inputs = np.ascontiguousarray(inputs.T)
            targets = np.ascontiguousarray(targets.T)

            # here we test your implementation:
            # your slice has to have the same shape as the GPU tensors you allocated
            assert inputs.shape == self.dev_X.shape, \
                "inputs has shape {}, but dev_X is {}".format(inputs.shape, self.dev_X.shape)
            assert targets.shape == self.dev_Y.shape, \
                "targets has shape {}, but dev_Y is {}".format(targets.shape, self.dev_Y.shape)

            # 3b. transfer from numpy arrays to device:
            # use the GPU memory buffers allocated previously,
            # and call the myTensorBuffer.set() function
            self.dev_X ...
            self.dev_Y ...

            # 3c. yield a tuple of the device tensors;
            # the first should be of shape (num_features, batch_size),
            # the second should be of shape (4, batch_size)
            yield (..., ...)
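If you get stuck, below is one possible completion of the skeleton (a sketch, not the only valid solution). It uses the backend helper self.be.iobuf(nfeatures), which allocates a device buffer of shape (nfeatures, batch_size), and it simply drops the final partial minibatch so the shape assertions always hold:
In [ ]:
# one possible completion of the SVHN iterator skeleton (a sketch)
class SVHN(NervanaDataIterator):

    def __init__(self, X, Y, lshape):
        self.X = X / 255.
        self.Y = Y
        self.shape = lshape

        # 1. required and useful attributes
        self.start = 0
        self.ndata = self.X.shape[0]          # number of images
        self.nfeatures = self.X.shape[1]      # 3 * 64 * 64 = 12288
        self.nbatches = self.ndata // self.be.bsz

        # 2. device buffers, shaped (num_features, batch_size)
        self.dev_X = self.be.iobuf(self.nfeatures)
        self.dev_Y = self.be.iobuf(4)

    def reset(self):
        self.start = 0

    def __iter__(self):
        # 3. loop through minibatches in the dataset
        for index in range(self.start, self.ndata, self.be.bsz):
            # drop the final partial minibatch so shapes always match
            if index + self.be.bsz > self.ndata:
                break

            # 3a. slice out one minibatch from the numpy arrays
            inputs = self.X[index:(index + self.be.bsz), :]
            targets = self.Y[index:(index + self.be.bsz), :]

            # transpose to (num_features, batch_size) and make contiguous
            inputs = np.ascontiguousarray(inputs.T)
            targets = np.ascontiguousarray(targets.T)

            # 3b. copy the host arrays into the device buffers
            self.dev_X.set(inputs)
            self.dev_Y.set(targets)

            # 3c. yield device tensors: (12288, bsz) and (4, bsz)
            yield (self.dev_X, self.dev_Y)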
Check your implementation! Below we grab an iteration and print the output of the dataset. Importantly, make sure that the output tensors are contiguous (e.g. is_contiguous = True in the output below). This means they are allocated in a contiguous block of memory, which is important for the downstream calculations. Contiguity can be broken by operations like transpose.
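As a quick numpy illustration of that last point (a toy example, separate from the iterator code):
In [ ]:
# transposing a numpy array yields a non-contiguous view;
# np.ascontiguousarray copies it back into contiguous memory
a = np.zeros((4, 128))
print a.flags['C_CONTIGUOUS']                          # True
print a.T.flags['C_CONTIGUOUS']                        # False
print np.ascontiguousarray(a.T).flags['C_CONTIGUOUS']  # True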
In [ ]:
# setup datasets
train_set = SVHN(X=svhn['X_train'], Y=svhn['y_train'], lshape=(3, 64, 64))
# grab one iteration from the train_set
iterator = train_set.__iter__()
(X, Y) = iterator.next()
print X # this should be shape (12288, 128)
print Y # this should be shape (4, 128)
assert X.is_contiguous
assert Y.is_contiguous
If all goes well, you are ready to try training a network! First, let's reset the dataset to zero (since you drew one minibatch above). We also add a test set for evaluation.
In [ ]:
train_set.reset()
# generate test set
test_set = SVHN(X=svhn['X_test'], Y=svhn['y_test'], lshape=(3, 64, 64))
We recommend using a VGG-style convolutional neural network to train this model, following the ConvNet design philosophy we introduced earlier. We've imported some relevant packages that you may want to use and included some guiding steps for implementing your network. Experiment with networks of different sizes!
Some tips:
- Stack Conv layers with Rectlin activations, and use Pooling layers to shrink the spatial dimensions.
- This is a regression problem, so the last layer should be a Linear layer with nout=4 (one output per bounding box coordinate), and SumSquared is a natural cost function.
- Start with a small network and a few epochs to confirm everything runs end-to-end, then scale up. One possible layer stack is sketched after the training cell below.
In [ ]:
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import GeneralizedCost, Affine, Conv, Pooling, Linear, Dropout
from neon.models import Model
from neon.optimizers import GradientDescentMomentum, RMSProp
from neon.transforms import Rectlin, Logistic, CrossEntropyMulti, Misclassification, SumSquared
# set up weight initializer
...
# set up model layers
layers = []
layers.append(....)
# the last layer should be a linear layer with nout=4, for the 4 coordinates of the bounding box.
layers.append(Linear(nout=4, init=Gaussian(loc=0.0, scale=0.01)))
# use SumSquared cost
cost = GeneralizedCost(costfunc=SumSquared())
# setup optimizer
optimizer = RMSProp()
# initialize model object
mlp = Model(layers=layers)
# configure callbacks
callbacks = Callbacks(mlp, eval_set=test_set, eval_freq=1, output_file='data.h5')  # write stats to data.h5 for the plotting cell below
# run fit
mlp.fit(train_set, optimizer=optimizer, num_epochs=10, cost=cost, callbacks=callbacks)
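For reference, here is one possible VGG-style layer stack for the skeleton above (a sketch; the filter counts, Affine width, and dropout rate are our own choices, not prescribed by the exercise):
In [ ]:
# a possible VGG-style network: pairs of 3x3 convs followed by 2x2 pooling
init = Gaussian(loc=0.0, scale=0.01)

layers = [
    Conv((3, 3, 32), init=init, activation=Rectlin(), padding=1),
    Conv((3, 3, 32), init=init, activation=Rectlin(), padding=1),
    Pooling(2, strides=2),   # 64x64 -> 32x32
    Conv((3, 3, 64), init=init, activation=Rectlin(), padding=1),
    Conv((3, 3, 64), init=init, activation=Rectlin(), padding=1),
    Pooling(2, strides=2),   # 32x32 -> 16x16
    Affine(nout=512, init=init, activation=Rectlin()),
    Dropout(keep=0.5),
    Linear(nout=4, init=init),   # 4 bounding box coordinates
]
Deeper stacks or more filters may improve the fit, at the cost of training time.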
Below we plot the cost data over time to help you visualize the training progress. This is similar to using the nvis command line tool to generate plots.
In [ ]:
from neon.visualizations.figure import cost_fig, hist_fig, deconv_summary_page
from neon.visualizations.data import h5_cost_data, h5_hist_data, h5_deconv_data
from bokeh.plotting import output_notebook, show
cost_data = h5_cost_data('data.h5', False)
output_notebook()
show(cost_fig(cost_data, 300, 600, epoch_axis=False))
To understand how the network performed, we sample images and plot the network's predicted bounding box against the ground truth bounding box. We evaluate this on the test_set
, which was not used to train the network.
In [ ]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
# get a minibatch's worth of
# inputs (X) and targets (T)
iterator = test_set.__iter__()
(X, T) = iterator.next()
# fprop the input to get the model output
y = mlp.fprop(X)
# transfer from device to numpy arrays
y = y.get()
T = T.get()
Our ground truth box T
and the model prediction y
are both arrays of size (4, batch_size)
. We can plot an image below. Feel free to modify i
to check performance on various test images. Red boxes are the model's guess, and blue boxes are the ground truth boxes.
In [ ]:
plt.figure(2)
imgs_to_plot = [0, 1, 2, 3]

for i in imgs_to_plot:
    plt.subplot(2, 2, i + 1)

    title = "test {}".format(i)
    plt.imshow(X.get()[:, i].reshape(3, 64, 64).transpose(1, 2, 0))
    ax = plt.gca()
    ax.add_patch(plt.Rectangle((y[0, i], y[1, i]), y[2, i], y[3, i], fill=False, edgecolor="red"))   # model guess
    ax.add_patch(plt.Rectangle((T[0, i], T[1, i]), T[2, i], T[3, i], fill=False, edgecolor="blue"))  # ground truth
    plt.title(title)
    plt.axis('off')
In [ ]:
i=0
print "Target box had coordinates: {}".format(T[:,i])
print "Model prediction has coordinates: {}".format(y[:, i])